42 research outputs found

    Towards Structural Classification of Proteins based on Contact Map Overlap

    Many measures have been proposed to quantify the similarity between protein 3-D structures. Among these, contact map overlap (CMO) maximization has received sustained attention during the past decade because it offers a fine estimation of the natural homology relation between proteins. Despite this large involvement of the bioinformatics and computer science communities, the performance of known algorithms remains modest. Due to the complexity of the problem, they get stuck on relatively small instances and are not applicable to large-scale comparison. This paper offers a clear improvement over past methods in this respect. We present a new integer programming model for CMO and propose an exact branch-and-bound (B&B) algorithm with bounds computed by solving a Lagrangian relaxation. The efficiency of the approach is demonstrated on a popular small benchmark (Skolnick set, 40 domains). On this set our algorithm significantly outperforms the best existing exact algorithms while also providing lower and upper bounds of better quality. Some hard CMO instances have been solved for the first time and within reasonable time limits. From the values of the running time and the relative gap (relative difference between upper and lower bounds), we obtained the right classification for this test. These encouraging results led us to design a harder benchmark to better assess the classification capability of our approach. We constructed a large-scale set of 300 protein domains (a subset of the ASTRAL database) that we have called Proteus 300. Using the relative gap of each of the 44850 pairs as a similarity measure, we obtained a classification in very good agreement with SCOP. Our algorithm thus provides a powerful classification tool for large structure databases.
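
    The relative gap used above as a similarity measure can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract only says "relative difference between upper and lower bounds", so dividing by the upper bound is an assumption, and `bounds` is a hypothetical callback standing in for an exact CMO solver.

```python
from itertools import combinations


def relative_gap(lower_bound: float, upper_bound: float) -> float:
    """Relative difference between the bounds; 0.0 means the bounds coincide.

    Normalising by the upper bound is an assumption, not taken from the paper.
    """
    if upper_bound <= 0:
        return 0.0
    return (upper_bound - lower_bound) / upper_bound


def pairwise_gaps(domains, bounds):
    """Map every unordered pair of domains to its relative gap.

    `bounds(a, b)` is a hypothetical callback returning (lower, upper).
    """
    return {
        (a, b): relative_gap(*bounds(a, b))
        for a, b in combinations(domains, 2)
    }


# Number of unordered pairs among 300 domains, as quoted in the abstract.
n_pairs = sum(1 for _ in combinations(range(300), 2))
```

    With 300 domains, `n_pairs` equals 300 * 299 / 2 = 44850, which matches the pair count stated in the abstract.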

    Solving Maximum Clique Problem for Protein Structure Similarity

    A basic assumption of molecular biology is that proteins sharing close three-dimensional (3D) structures are likely to share a common function and in most cases derive from a common ancestor. Computing the similarity between two protein structures is therefore a crucial task and has been extensively investigated. Evaluating the similarity of two proteins can be done by finding an optimal one-to-one matching between their components, which is equivalent to identifying a maximum weighted clique in a specific "alignment graph". In this paper we present a new integer programming formulation for solving such clique problems. The model has been implemented using the ILOG CPLEX Callable Library. In addition, we designed a dedicated branch and bound algorithm for solving the maximum cardinality clique problem. Both approaches have been integrated into VAST (Vector Alignment Search Tool), a program for aligning protein 3D structures that is widely used at NCBI (National Center for Biotechnology Information). The original VAST clique solver uses the well-known Bron and Kerbosch algorithm (BK). Our computational results on real-life protein alignment instances show that our branch and bound algorithm is up to 116 times faster than BK for the largest proteins.
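
    For orientation, the BK baseline mentioned above can be sketched in a few lines. This is the textbook Bron and Kerbosch recursion with pivoting, adapted to keep only the largest clique found; it is neither the VAST implementation nor the authors' branch and bound algorithm, and the dictionary-of-sets graph representation is an assumption made for brevity.

```python
def max_clique(adj):
    """Return one maximum-cardinality clique of an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    """
    best = set()

    def expand(r, p, x):
        # r: current clique, p: candidates to extend r, x: already explored
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far
            if len(r) > len(best):
                best = set(r)
            return
        # Pivot on the vertex covering most candidates to prune branches
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Triangle 1-2-3 with a pendant vertex 4 attached to 3
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
clique = max_clique(graph)
```

    On this small graph the largest clique is the triangle {1, 2, 3}. The enumeration is exponential in the worst case, which is consistent with the abstract's motivation for a dedicated solver on large alignment graphs.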

    N–Dimensional Orthogonal Tile Sizing Problem

    AMS subject classification: 68Q22, 90C90. We discuss in this paper the problem of generating highly efficient code when an (n+1)-dimensional nested loop program is executed on an n-dimensional torus/grid of distributed-memory general-purpose machines. We focus on a class of uniform recurrences with non-negative components of the dependency matrix. Using the strategy of tiling the iteration space, we show that minimizing the total running time reduces to solving a non-trivial non-linear integer optimization problem. For the latter we present a mathematical framework that enables us to derive an O(n log n) algorithm for finding a good approximate solution. The theoretical evaluations and the experimental results show that the obtained solution approximates the original minimum sufficiently well in the context of the considered problem. Such an algorithm is usable in real time for very large values of n and can be used as an optimization technique in parallelizing compilers as well as in manual performance tuning of parallel codes.

    Lagrangian Approaches for a class of Matching Problems in Computational Biology

    This paper presents efficient algorithms for solving the problem of aligning a protein structure template to a query amino-acid sequence, known as the protein threading problem. We consider the problem as a special case of the graph matching problem. We give formal graph and integer programming models of the problem. After studying the properties of these models, we propose two kinds of Lagrangian relaxation for solving them. We present experimental results on real-life instances showing the efficiency of our approaches.

    Flexible Alignments for Protein Threading

    We present a new local alignment method for the protein threading problem. Local sequence-sequence alignments are widely used to find functionally important regions in families of proteins. However, to the best of our knowledge, no local sequence-structure alignment algorithm has been described in the literature. Here we model local alignments as Mixed Integer Programming (MIP) models. These models make it possible to align a part of a protein structure onto a protein sequence in order to detect local similarities. The paper describes two MIP models and compares and analyzes their performance using the ILOG CPLEX 10 solver.

    Comparing Protein 3D Structures Using A_purva

    Structural similarity between proteins provides significant insights about their functions. Contact Map Overlap maximization (CMO) has received sustained attention during the past decade and can be considered today as a credible protein structure similarity measure. We present here A_purva, an exact CMO solver that is both efficient (notably faster than the previous exact algorithms) and reliable (providing accurate upper and lower bounds of the solution). These properties make it applicable for large-scale protein comparison and classification. Availability: http://apurva.genouest.org Contact: [email protected] Supplementary information: A_purva's user manual, as well as many examples of protein contact maps, can be found on A_purva's web page.

    Modèle de PLNE pour la recherche de cliques de poids maximal

    National audience. Estimating the similarity of two protein structures is a very important task in biology. It is usually based on an alignment, i.e. a one-to-one matching between the amino acids of each protein. Among all the methods for aligning proteins, we are interested in VAST, which first aligns the secondary structure elements (SSEs) and then extends this alignment to the amino acids. The SSE alignment is presented as a maximum clique problem in a particular graph. In this paper we propose a new integer programming model for various maximum weight clique problems, and we successfully applied it in VAST.

    The Protein Threading Problem is in P?

    This work is about a problem from computational biology known as the protein threading problem. By finding an appropriate linear mixed-integer programming (MIP) formulation, we demonstrate that real-life instances of this problem can be solved efficiently using only a linear programming (LP) solver instead of a special-purpose branch-and-bound algorithm. This is because, within the proposed MIP model, all the biological instances we were able to test attain their optima at feasible vertices of the underlying LP polytope, which is the essence of the statement in the title.

    A Novel Algorithm for Finding Maximum Common Ordered Subgraph

    In this paper, we study the following problem: given the adjacency matrices of two simple graphs, find two principal submatrices (viewed as vectors) having the maximum inner product. When used for computing the similarity of two protein structures, this problem is called contact map overlap, and for the latter we give an exact B&B algorithm with bounds computed by solving a Lagrangian relaxation of the problem. The efficiency of the approach is demonstrated on a popular benchmark set of instances, together with a comparison with the best existing algorithm.
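
    The objective described above can be made concrete with a brute-force sketch for tiny instances. It is an illustration under stated assumptions, not the B&B algorithm of the paper: `overlap` and `brute_force_cmo` are hypothetical names, matchings are restricted to order-preserving index selections of equal size, and each shared edge is counted once (the upper triangle of the symmetric matrices), i.e. half the full matrix inner product.

```python
from itertools import combinations


def overlap(a, b, rows, cols):
    """Count edges shared when vertex rows[k] of a is matched to cols[k] of b."""
    k = len(rows)
    return sum(
        a[rows[p]][rows[q]] * b[cols[p]][cols[q]]
        for p in range(k)
        for q in range(p + 1, k)
    )


def brute_force_cmo(a, b):
    """Exhaustive search over order-preserving matchings; tiny inputs only.

    a and b are square 0/1 adjacency matrices given as lists of lists.
    Returns (best score, chosen indices in a, chosen indices in b).
    """
    best = (0, (), ())
    for k in range(2, min(len(a), len(b)) + 1):
        for rows in combinations(range(len(a)), k):
            for cols in combinations(range(len(b)), k):
                score = overlap(a, b, rows, cols)
                if score > best[0]:
                    best = (score, rows, cols)
    return best


# Path 0-1-2-3 compared with a triangle on three vertices
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
score, rows, cols = brute_force_cmo(path, triangle)
```

    Here the best overlap is 2 shared edges, since any three path vertices span at most two edges. The number of candidate matchings grows combinatorially with graph size, which is why exact methods with good bounds are needed for real contact maps.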